Introduction¶
Hi, I’m Aman, a Master’s student in Life Science Informatics. This is my deep learning project, where I am implementing a Convolutional Neural Network (CNN) using Keras to identify cells infected by malaria.
The dataset for training and evaluation is publicly available on Kaggle — it contains microscopic images of both parasitized and uninfected cells, enabling the model to learn distinguishing features through image classification.
This project aims to:
**Explore the fundamentals of deep learning in biomedical image analysis.**
**Gain hands-on experience with CNN architectures for real-world healthcare applications.**
**Understand data preprocessing, augmentation, and model optimization to improve classification accuracy.**
Through this project, I am bridging my background in life sciences with my growing expertise in machine learning and artificial intelligence, opening pathways to more advanced research in medical diagnostics and bioinformatics.
Modules and Utils¶
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.metrics import Precision, BinaryAccuracy, Recall
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import numpy as np
import cv2
import os
Loading and filtering Data from main dataset¶
data_dir = "data"
image_exts = ["jpg", "jpeg", "png", "bmp"]
img = cv2.imread(os.path.join(data_dir, "Parasitized", "C100P61ThinF_IMG_20150918_144104_cell_162.png"))
img.shape
(148, 142, 3)
Categories ('Parasitized', 'Uninfected')¶
os.listdir(data_dir)
['Parasitized', 'Uninfected']
tf.keras.utils.image_dataset_from_directory??
Signature: tf.keras.utils.image_dataset_from_directory(directory, labels='inferred', label_mode='int', class_names=None, color_mode='rgb', batch_size=32, image_size=(256, 256), shuffle=True, seed=None, validation_split=None, subset=None, interpolation='bilinear', follow_links=False, crop_to_aspect_ratio=False, pad_to_aspect_ratio=False, data_format=None, verbose=True)
Docstring (abridged): Generates a `tf.data.Dataset` from image files in a directory. With `labels='inferred'`, each subdirectory of `directory` is treated as one class, and integer labels 0, 1, ... are assigned in alphanumeric order of the subdirectory names. Supported formats: `.jpeg`, `.jpg`, `.png`, `.bmp`, `.gif`. If `validation_split` and `subset` are set, the utility can also return a (training, validation) pair directly.
File: c:\users\aman yadav\anaconda3\lib\site-packages\keras\src\utils\image_dataset_utils.py Type: function
Loading the full dataset and distributing the images into batches of 32, with shuffling enabled.
In Keras, batching reduces memory usage by processing only a portion of the dataset at a time instead of loading all of it at once (important here, since we have more than 27,000 images). It also speeds up computation, because GPUs and TPUs are optimized for processing many images in parallel. Batching stabilizes training by averaging gradients over several samples, leading to smoother and more reliable learning. Finally, the batch size itself is a tunable hyperparameter that can influence convergence speed and generalization.
data = tf.keras.utils.image_dataset_from_directory(data_dir)
data_iterator = data.as_numpy_iterator()
batch = data_iterator.next()
Found 27558 files belonging to 2 classes.
batch[0].shape # -> images
(32, 256, 256, 3)
batch[1] # -> Labels
array([0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 1, 0, 1])
We must visualize a few images to check which number in our label array (y) corresponds to which class.
fig, ax = plt.subplots(ncols=4, figsize=(15, 14))
for index, img in enumerate(batch[0][:4]):
    ax[index].imshow(img.astype(int))
    ax[index].title.set_text(batch[1][index])
# Class 0 = Parasitized / infected
# Class 1 = Uninfected
Data Preprocessing¶
Normalization of our images¶
Normalizing images in a CNN is important because it keeps pixel values within a consistent range, which makes training more stable and faster. It prevents the model from being biased toward brighter or darker images and helps activation functions like ReLU or tanh work efficiently. Normalization also improves generalization, allowing the CNN to learn features that perform well on new, unseen data.
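The effect of the 1/255 scaling can be seen on a toy array (a hypothetical 8-bit patch, not one of the dataset images):

```python
import numpy as np

# A hypothetical 8-bit image patch: raw pixel values live in [0, 255].
img = np.array([[0, 64, 128, 255]], dtype=np.uint8)

# Same scaling the pipeline applies below via data.map
scaled = img / 255.0
print(scaled.min(), scaled.max())  # values now lie in [0, 1]
```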
Here I use the map function to normalize each image. Notice that "y" is left unchanged: those are our labels.
data = data.map(lambda x, y: (x / 255, y))
normalized_iter = data.as_numpy_iterator()
print(f"Max Pixel Value {normalized_iter.next()[0].max()} ")  # -> highest pixel (note: each next() call pulls a fresh batch)
print(f"Min Pixel Value {normalized_iter.next()[0].min()} ")  # -> lowest pixel
Max Pixel Value 0.9660042524337769 Min Pixel Value 0.0
batch = normalized_iter.next()
fig, ax = plt.subplots(ncols=4, figsize=(15, 14))
for index, img in enumerate(batch[0][:4]):
    ax[index].imshow(img)
    ax[index].title.set_text(batch[1][index])
# Class 0 = Parasitized / infected
# Class 1 = Uninfected
batch[1] # -> labels
array([0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0,
0, 1, 1, 0, 1, 1, 0, 0, 1, 1])
Data Splitting¶
print(f" Total batches {(len(data))} ") # - > total batches
print(f" batch shape {batch[0].shape}") # - > images in in one batch and channels
total_images = len(data) * batch[0].shape[0]
print(f" Total Images {total_images}")
Total batches 862 batch shape (32, 256, 256, 3) Total Images 27584
Splitting the data into 70% training, 20% testing and 10% validation (the int() truncation leaves one of the 862 batches unused)
train_batches = int(len(data) * 0.7)
test_batches = int(len(data) * 0.2)
val_batches = int(len(data) * 0.1)
print(train_batches + test_batches + val_batches)
print(train_batches, test_batches, val_batches)
861 603 172 86
train, test and val partition¶
take(n): selects the first n elements of a dataset.
skip(n): skips the first n elements and returns the rest.
Used together, they slice a dataset into non-overlapping subsets in TensorFlow.
train = data.take(train_batches)
test = data.skip(train_batches).take(test_batches)
val = data.skip(train_batches + test_batches).take(val_batches)
print(f"Training data total {len(train)}")
print(f"Testing data total {len(test)}")
print(f"Validation data total {len(val)}")
Training data total 603 Testing data total 172 Validation data total 86
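The take/skip semantics are easy to verify on a toy dataset:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)  # elements 0 .. 9

head = list(ds.take(3).as_numpy_iterator())  # the first three elements
tail = list(ds.skip(3).as_numpy_iterator())  # everything after them
print(head, tail)
```

One caveat worth knowing: the pipeline above was built with shuffle=True, and tf.data reshuffles by default on each pass, so a take/skip split is only guaranteed to be disjoint within a single pass; for strictly fixed splits, the `validation_split`/`subset` arguments of image_dataset_from_directory are, as far as I can tell, the safer option.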
Deep Learning Model¶
Training¶
Here I am using a Sequential CNN model where layers are stacked one after another.
The model has three convolutional layers with a kernel size of 3x3 to extract features from the input images.
Each convolutional layer is followed by a MaxPooling2D layer to reduce the spatial dimensions and keep important features.
After the convolutional layers, the Flatten layer converts the 3D feature maps into a 1D vector.
The Dense layers process these features to learn complex patterns and make predictions.
The final layer uses a sigmoid activation to output a probability for binary classification (infected vs. healthy cells).
cnn = models.Sequential([
    layers.Input(shape=(256, 256, 3)),  # explicit Input layer (preferred over passing input_shape to the first Conv2D)
    # conv layer 1
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # conv layer 2
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # conv layer 3
    layers.Conv2D(filters=16, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
cnn.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
cnn.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d (Conv2D) │ (None, 254, 254, 32) │ 896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d (MaxPooling2D) │ (None, 127, 127, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_1 (Conv2D) │ (None, 125, 125, 32) │ 9,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_1 (MaxPooling2D) │ (None, 62, 62, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_2 (Conv2D) │ (None, 60, 60, 16) │ 4,624 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_2 (MaxPooling2D) │ (None, 30, 30, 16) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten (Flatten) │ (None, 14400) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 256) │ 3,686,656 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 1) │ 257 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 3,701,681 (14.12 MB)
Trainable params: 3,701,681 (14.12 MB)
Non-trainable params: 0 (0.00 B)
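The parameter counts in the summary can be checked by hand: each Conv2D layer contributes (kernel_h * kernel_w * in_channels + 1) * filters parameters (the +1 is the bias), and each Dense layer contributes (in_units + 1) * out_units:

```python
def conv_params(kh, kw, in_ch, filters):
    # kernel weights plus one bias per filter
    return (kh * kw * in_ch + 1) * filters

def dense_params(in_units, out_units):
    # weight matrix plus one bias per output unit
    return (in_units + 1) * out_units

counts = [
    conv_params(3, 3, 3, 32),         # conv2d   -> 896
    conv_params(3, 3, 32, 32),        # conv2d_1 -> 9,248
    conv_params(3, 3, 32, 16),        # conv2d_2 -> 4,624
    dense_params(30 * 30 * 16, 256),  # dense    -> 3,686,656
    dense_params(256, 1),             # dense_1  -> 257
]
print(counts, sum(counts))  # total 3,701,681
```

Note how the first Dense layer dominates: flattening the 30x30x16 feature map into 14,400 units accounts for almost all of the 3.7M parameters.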
hist = cnn.fit(train, epochs=3, validation_data=val)
Epoch 1/3 603/603 ━━━━━━━━━━━━━━━━━━━━ 281s 463ms/step - accuracy: 0.7096 - loss: 0.5460 - val_accuracy: 0.9364 - val_loss: 0.1987 Epoch 2/3 603/603 ━━━━━━━━━━━━━━━━━━━━ 257s 426ms/step - accuracy: 0.9367 - loss: 0.2025 - val_accuracy: 0.9353 - val_loss: 0.1782 Epoch 3/3 603/603 ━━━━━━━━━━━━━━━━━━━━ 253s 419ms/step - accuracy: 0.9474 - loss: 0.1688 - val_accuracy: 0.9408 - val_loss: 0.1791
print(hist.history.keys())
dict_keys(['accuracy', 'loss', 'val_accuracy', 'val_loss'])
The model is actually performing very well.
Model Evaluation¶
import plotly.graph_objects as go
acc = hist.history['accuracy']
val_acc = hist.history['val_accuracy']
epochs = list(range(1, len(acc)+1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=acc, mode='lines+markers', name='Train Accuracy'))
fig.add_trace(go.Scatter(x=epochs, y=val_acc, mode='lines+markers', name='Validation Accuracy'))
fig.update_layout(
title='Training vs Validation Accuracy',
xaxis_title='Epoch',
yaxis_title='Accuracy',
template='plotly_dark'
)
fig.show()
loss = hist.history['loss']
val_loss = hist.history['val_loss']
epochs = list(range(1, len(acc)+1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs, y=loss , mode='lines+markers', name='Train loss'))
fig.add_trace(go.Scatter(x=epochs, y=val_loss, mode='lines+markers', name='Validation loss'))
fig.update_layout(
title='Training vs Validation loss',
xaxis_title='Epoch',
yaxis_title='loss',
template='plotly_dark'
)
fig.show()
pre = Precision()
re = Recall()
acc = BinaryAccuracy()
len(test)
172
Running predictions over all 172 test batches¶
for batch in test.as_numpy_iterator():
    X, y = batch
    yPred = cnn.predict(X)
    pre.update_state(y, yPred)
    re.update_state(y, yPred)
    acc.update_state(y, yPred)
print(f"precision {pre.result().numpy()}")
print(f"recall {re.result().numpy()}")
print(f"accuracy {acc.result().numpy()}")
precision 0.9373424649238586 recall 0.9462013840675354 accuracy 0.9414970874786377
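For reference, these Keras metrics reduce to simple counts of true/false positives and negatives. A tiny numpy sketch with made-up labels (not the model's actual predictions) shows the formulas:

```python
import numpy as np

# Hypothetical labels and thresholded sigmoid outputs, for illustration only.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

precision = tp / (tp + fp)  # of all "positive" calls, how many were right
recall = tp / (tp + fn)     # of all true positives, how many were found
accuracy = np.mean(y_true == y_pred)
print(precision, recall, accuracy)
```

For a screening task like malaria detection, recall is arguably the metric to watch, since a false negative (a missed infection) is costlier than a false positive.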
Testing¶
Testing Image 1¶
img_test = cv2.imread("infected testing.png")
img_test = cv2.cvtColor(img_test, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
plt.imshow(img_test)
plt.axis("off")
resized = tf.image.resize(img_test, (256, 256))
plt.imshow(resized.numpy().astype("uint8"))
plt.axis("off")
plt.show()
print(resized.shape)
predictor = cnn.predict(np.expand_dims(resized / 255, 0))  # normalize the test image exactly like the training data (important step)
print(predictor)
if predictor > 0.5:
    print("Cell is not infected")
else:
    print("cell is infected")
(256, 256, 3) 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step [[0.05020722]] cell is infected
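Because labels were inferred in alphanumeric order of the subdirectories, index 0 is 'Parasitized' and index 1 is 'Uninfected', so a sigmoid output below 0.5 means infected. The thresholding above can be wrapped in a small helper (interpret is a hypothetical name, not part of the notebook):

```python
# Class order follows the alphanumeric subdirectory names.
class_names = ["Parasitized", "Uninfected"]

def interpret(sigmoid_output, threshold=0.5):
    # sigmoid_output > threshold -> index 1 (Uninfected), else index 0
    return class_names[int(sigmoid_output > threshold)]

print(interpret(0.0502))  # Parasitized (infected)
print(interpret(0.9415))  # Uninfected
```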
Testing Image 2¶
img_test2 = cv2.imread("31-researchersm.jpg")
img_test2 = cv2.cvtColor(img_test2, cv2.COLOR_BGR2RGB)
plt.imshow(img_test2)
plt.axis("off")
resized2 = tf.image.resize(img_test2, (256, 256))
plt.imshow(resized2.numpy().astype("uint8"))
plt.axis("off")
plt.show()
print(resized2.shape)
predictor = cnn.predict(np.expand_dims(resized2 / 255, 0))
print(predictor)
if predictor > 0.5:
    print("Cell is not infected")
else:
    print("cell is infected")
(256, 256, 3) 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step [[0.9415034]] Cell is not infected
The trained model can now predict images of RBCs and identify whether they are infected or not.¶
Uses and Benefits of the Model:¶
Malaria Detection: Automatically identifies RBCs infected by the malaria parasite.
Faster Diagnosis: Reduces the time needed for manual microscopic examination of blood samples.
High Accuracy: Can pick up subtle signs of infection that human observers might miss.
Scalable Screening: Can process large volumes of blood smear images quickly, useful in hospitals and remote clinics.
Decision Support for Clinicians: Assists doctors and lab technicians by providing preliminary screening results.
Research Applications: Useful for studying malaria prevalence, treatment effectiveness, and infection trends.
Resource Efficiency: Reduces dependence on highly trained personnel for routine screening in high-risk areas.